Physiological Measurement — Latest Matching Preprints

1

Patient Versus Prediction-Level Evaluation of a Dynamic Clinical Prediction Model of Sepsis

Tuttle, M.; Maas, C. C. H. M.; An, J.; Wessler, B. S.; Harvey, W. F.; Selker, H. P.; van Klaveren, D.; Kent, D. M.

2026-05-27 health systems and quality improvement 10.64898/2026.05.26.26354141 medRxiv

Top 0.1%

6.4%

The Epic Sepsis Model version 2 (ESMv2) is a prediction model embedded in the electronic medical record that warns clinicians which hospitalized patients are at risk for sepsis. We conducted a retrospective cohort study of 31,951 hospitalizations of 25,760 patients to compare analyses conducted at the commonly used patient level (where the maximum prediction prior to the onset of sepsis is used to measure performance) with analyses at the novel prediction level (where each prediction is used to measure performance). Sepsis, defined by the Sepsis-3 criteria, occurred during 1,049 hospitalizations (3.3%). Patient-level analyses suggested excellent discrimination (AUC 0.86; IQR 0.85, 0.87), whereas prediction-level analyses demonstrated lower performance (AUC 0.62; IQR 0.57, 0.65). Low estimates of the positive predictive value (14.5% at the patient level vs 4% at the prediction level) imply a high number of false alerts. Common evaluation approaches may overstate the performance of dynamic prediction models and mislead clinical decision-making.
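The gap between the two evaluation levels can be illustrated with a minimal Python sketch. The toy scores, the hospitalization identifiers, and the rule that every prediction in a septic hospitalization counts as positive are assumptions made for illustration; they are not the study's data or its exact labeling scheme.

```python
from collections import defaultdict


def auc(scores, labels):
    """AUC as the chance a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predictions: (hospitalization id, risk score, sepsis during stay).
predictions = [
    ("A", 0.2, 1), ("A", 0.3, 1), ("A", 0.9, 1),
    ("B", 0.1, 0), ("B", 0.4, 0), ("B", 0.5, 0),
    ("C", 0.3, 0), ("C", 0.6, 0),
]

# Prediction level: every individual prediction is scored on its own.
prediction_auc = auc([s for _, s, _ in predictions], [y for _, _, y in predictions])

# Patient level: keep only the maximum score per hospitalization.
max_score = defaultdict(float)
label = {}
for hosp, score, y in predictions:
    max_score[hosp] = max(max_score[hosp], score)
    label[hosp] = y
patient_auc = auc([max_score[h] for h in label], [label[h] for h in label])
```

In this toy data the patient-level AUC is 1.0 while the prediction-level AUC is 0.5, because one high score is enough to rank the septic hospitalization first once its predictions are collapsed to their maximum.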

2

Deep learning optimisation for cardiology: Neural Architecture Search-driven arrhythmia classification with electrocardiograms

Vanegas Mueller, E.; Joe-Oshodi, A.; Banerjee, A.; Villarroel, M.

2026-05-30 cardiovascular medicine 10.64898/2026.05.28.26354348 medRxiv

Top 0.1%

4.9%

Cardiovascular disease is the leading cause of death worldwide. Sudden cardiac death (SCD) accounts for roughly 50% of all cardiac deaths. The electrocardiogram (ECG) is widely used for early diagnosis of cardiac disease. However, the complexity of accurate interpretation limits the ECG's efficacy. Modern deep learning methods have been applied to assist clinicians in diagnosis. We applied Neural Architecture Search (NAS), an automated machine learning technique, to identify optimal deep learning architectures for classifying cardiac arrhythmias from ECGs. We applied the Differentiable Architecture Search strategy to an AutoFormer search space to identify optimal self-attention architectures for arrhythmia classification. We trained, validated, and tested the resulting model on the PhysioNet Challenge 2021 dataset (n = 88,253), comprising ECGs across three continents. We performed a hyperparameter optimisation on the NAS output, exploring input patch size, class weighting, and loss function. We evaluated performance using the PhysioNet Challenge metric and the area under the receiver operating characteristic curve (AUROC). The NAS converged towards minimal architectural configurations (embedding dimension: 384, depth: 4, self-attention heads: 4, MLP ratio: 1) with a validation challenge metric of 0.66 (PhysioNet Challenge 21 Winner: 0.63). The NAS-created network achieved an AUROC of 0.97 and a challenge metric of 0.71 during testing. Normal Sinus Rhythm and Sinus Tachycardia achieved AUROCs of 0.99. Low-QRS Voltage and T-wave abnormality were the worst-performing arrhythmias, with AUROCs of 0.89 and 0.90, respectively. We interpret that architectural simplicity drives performance in arrhythmia classification. Because SCD is unexpected, prevention strategies in free-living environments require lightweight computational resources suitable for wearable devices. 
Class imbalance fundamentally limits classification performance for rare arrhythmias such as Low-QRS Voltage and T-wave inversion, irrespective of hyperparameter choices. However, the self-attention mechanism can autonomously abstract clinical representations, simplifying clinical deployment by eliminating the need for an explicit feature-extraction pipeline.

3

The Sleep-Wake Classification Performance of Pediatric-Trained Machine Learning Algorithms for Raw Accelerometer Data

Chen, P.-W.; Cielo, C.; Walsh, O.; Mcdonald, M.; Song, P. X.; Goldstein, C.; Moreno, J. P.; Jansen, E.; Mitchell, J. A.

2026-06-01 pediatrics 10.64898/2026.05.28.26354364 medRxiv

Top 0.1%

4.0%

Introduction: Actigraphy sleep-wake classification methods increasingly seek to leverage raw acceleration data and machine-learning-based classification, but performance evaluation in pediatrics is limited. We trained machine-learning models using pediatric data and compared their sleep-wake classification performance with existing algorithms for children. Methods: Sixty-five children (46% female, ages 5.3 to 17.7 years) completed in-lab overnight polysomnography and wore a GENEActiv device on their non-dominant wrist. The acceleration data were converted into 30-second epochs and aligned with physician-scored sleep-wake data from electroencephalography. Seven machine-learning models were trained using leave-one-subject-out cross-validation. Epoch-by-epoch analyses generated performance metrics (e.g., balanced accuracy [BA]) and discrepancy analyses provided overall sleep duration bias estimates. Models were ranked on the combination of highest performance and least bias using Euclidean distance scores, where a lower score is closer to perfect performance and zero bias. For benchmarking, we included GGIR sleep scoring algorithms and an adult-trained random forest classifier. Results: Overall, 560.1 hours of polysomnography and actigraphy data were collected (74.4% of epochs were scored as sleep). The pediatric-trained local-global long short-term memory (LSTM) classifier had the best epoch-by-epoch performance (e.g., BA=0.85, sensitivity=0.88, specificity=0.83, ROC-AUC=0.95, and Cohen kappa=0.67). These metrics exceeded those of an adult-trained random forest classifier and GGIR-based algorithms. Discrepancy analyses revealed that overall sleep duration was underestimated by an average of 25 minutes using the LSTM classifier, with no proportional bias. Conclusion: We trained seven pediatric sleep-wake classifiers with a strong ability to detect sleep and wake, with the LSTM classifier performing best.
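As a rough sketch of the ranking idea, balanced accuracy is the mean of sensitivity and specificity, and a Euclidean distance score can measure how far a model sits from the ideal point of perfect performance and zero bias. The abstract does not give the exact formula, so the `distance_score` function, its `bias_scale` parameter, the epoch counts, and the second model below are all hypothetical.

```python
import math


def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity (sleep detected) and specificity (wake detected)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def distance_score(ba, bias_minutes, bias_scale=60.0):
    """Distance from the ideal point (BA = 1, bias = 0).

    bias_scale (minutes treated as one unit) is an assumed scaling choice.
    """
    return math.hypot(1.0 - ba, bias_minutes / bias_scale)


# Hypothetical epoch counts; model_a mirrors sensitivity 0.88 and specificity 0.83.
models = {
    "model_a": {"ba": balanced_accuracy(tp=88, fn=12, tn=83, fp=17), "bias": -25.0},
    "model_b": {"ba": balanced_accuracy(tp=95, fn=5, tn=60, fp=40), "bias": -40.0},
}

# Lower distance = closer to perfect performance and zero bias.
ranking = sorted(models, key=lambda m: distance_score(models[m]["ba"], models[m]["bias"]))
```

With a sensitivity of 0.88 and a specificity of 0.83, the balanced accuracy works out to 0.855, in line with the reported BA of 0.85.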

4

Wilson's Central Terminal Changes Location on the Body Surface During the P-Wave: Why Precordial Leads Might Not Be What We Think

Bender, J.; Stoks, J.; Barrios Espinosa, C.; Becker, S.; Cluitmans, M. J. M.; Loewe, A.

2026-05-28 cardiovascular medicine 10.64898/2026.05.20.26352966 medRxiv

Top 0.2%

1.8%

Background and Aims: Clinical interpretation of the precordial leads V1-V6 assumes that Wilson's central terminal (WCT) has a fixed anatomical location. Consequently, a positive signal corresponds to electrical activation spreading from WCT towards the respective electrode, and vice versa. However, the location of WCT has never been systematically investigated, although a better understanding of it could improve the interpretation of the precordial leads. This work aims to characterize the spatial extent and location of the physical WCT, i.e., the electrical potential defined by the WCT, on the body surface during the P-wave. Methods: An intensive analysis of body surface potential maps (BSPMs) during atrial depolarization in an in silico patient cohort and clinical data was conducted. Results: During the P-wave, the location of WCT was not stationary; its spatial extent and location varied over time as well as across individuals. Four distinct spatial patterns of WCT distribution on the body surface were identified in silico, and three of these were found in the clinical cohort. WCT signals agreed with BSPM signals at commonly assumed positions of WCT only for a small fraction of the P-wave. Conclusion: The spatial extent and location of WCT change during the P-wave and thus should be considered when interpreting the precordial leads.
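By the standard definition, the WCT potential is the mean of the right-arm, left-arm, and left-leg electrode potentials, and a unipolar precordial lead is the chest electrode potential minus that mean. The sketch below shows one way to look for body-surface nodes whose potential matches the WCT at a given instant. The node names, potentials (in mV), and tolerance are hypothetical, and this is not the authors' analysis pipeline.

```python
def wct_potential(ra, la, ll):
    """Wilson's central terminal: mean of the three limb electrode potentials."""
    return (ra + la + ll) / 3.0


def nodes_matching_wct(bspm_frame, ra, la, ll, tol=0.01):
    """Return body-surface nodes whose potential lies within tol of the WCT potential."""
    ref = wct_potential(ra, la, ll)
    return [node for node, v in bspm_frame.items() if abs(v - ref) <= tol]


# Two hypothetical time samples during the P-wave (all values in mV).
frame_t0 = {"n1": 0.02, "n2": 0.10, "n3": 0.025}
frame_t1 = {"n1": 0.02, "n2": 0.065, "n3": 0.20}

match_t0 = nodes_matching_wct(frame_t0, ra=-0.03, la=0.03, ll=0.06)
match_t1 = nodes_matching_wct(frame_t1, ra=0.0, la=0.06, ll=0.12)
```

In this toy example the matching nodes differ between the two samples, which is the kind of movement of the WCT location over time that the abstract describes.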

5

Noninvasive Hypokalemia Detection from Single-Lead AI-ECG: Development, Multicenter Validation, and Prospective Pilot Study in the Emergency Department

Tang, G.; Li, X.; Xiao, Y.; Wang, K.; Wu, M.; Wei, Z.; Yu, M.; Chen, X.; Hong, W.; Cheng, F.; Li, X.; Zhang, J.; Wu, X.; Hong, S.

2026-06-01 health informatics 10.64898/2026.05.23.26353774 medRxiv

Top 0.3%

1.2%

Hypokalemia is a common and potentially life-threatening electrolyte abnormality in emergency care, yet rapid noninvasive screening remains difficult in time-critical triage settings. We developed PocketED-K, a single-lead AI-ECG prescreening model initialized from ECGFounder, and evaluated it in retrospective multicenter cohorts and a prospective handheld pilot. Retrospective development and validation included 37,115 patients from MC-MED and MIMIC-ED, and the pilot enrolled 18 patients at Peking University First Hospital. Hypokalemia was defined as venous serum potassium < 3.5 mmol/L. PocketED-K achieved AUROCs of 0.8189 (95% CI 0.8172-0.8207) in internal testing, 0.8104 (95% CI 0.8092-0.8115) in temporal validation, and 0.7889 (95% CI 0.7692-0.8074) in independent external validation; external negative predictive value was 0.9911 (95% CI 0.9895-0.9925). Higher predicted risk was associated with ST-segment depression, T-wave flattening or inversion, and relative U-wave prominence. The prospective handheld pilot provided an initial signal of workflow feasibility in real-world acquisition. These findings support single-lead AI-ECG as a low-burden prescreening tool to prioritize potassium testing in emergency care.
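Negative predictive value depends on prevalence as well as on sensitivity and specificity, which matters when reading a high NPV for a prescreening tool. The sketch below applies the standard formulas to hypothetical inputs; the sensitivity, specificity, and prevalence shown are illustrative assumptions, not values reported for PocketED-K.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV from sensitivity, specificity, and prevalence (all as fractions)."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return {"ppv": tp / (tp + fp), "npv": tn / (tn + fn)}


# Hypothetical operating point and prevalence, for illustration only.
example = predictive_values(sensitivity=0.8, specificity=0.7, prevalence=0.05)
```

At a low prevalence such as this, the NPV is high even with moderate specificity, while the PPV stays modest, so the two values are best read together.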

6

Use of large language models by academic hospitalists: results of a multicenter survey

Bressman, E.; Auerbach, A.; Keniston, A.; Jens, C.; Ranji, S.

2026-05-29 health systems and quality improvement 10.64898/2026.05.27.26353610 medRxiv

Top 0.3%

1.0%

Introduction: The use of artificial intelligence (AI) by clinicians has increased rapidly in recent years, with large language models (LLMs) emerging as tools that can equal clinician diagnostic performance in simulated settings. However, limited data exist regarding physicians' use of LLMs in real-world clinical practice. This study aimed to evaluate the frequency of LLM use among practicing hospitalists, identify which LLMs are most commonly utilized, and assess hospitalists' perceptions of the benefits and limitations of LLM use in clinical care. Methods: We conducted a cross-sectional survey study of academic hospital medicine faculty across 8 institutions within the Hospital Medicine Reengineering Network (HOMERuN), a collaborative research consortium. Eligible participants included hospitalists practicing within participating HOMERuN sites during the study period. The survey assessed the frequency of LLM use, types of LLMs used, clinical applications, and physician perceptions regarding usefulness, efficiency, and concerns associated with LLM adoption. Results: 170 respondents (67.1%) reported ever using an LLM in clinical practice. Among LLM users, OpenEvidence was the most used tool (88.9%), followed by ChatGPT (58.5%), Google Gemini (26.9%), and Microsoft Copilot (20.5%). Only a minority of hospitalists reported using LLMs daily while seeing patients. The most common use cases of LLMs were answering diagnostic (77.1%) and management (77.6%) questions. A majority also reported using LLMs to identify or summarize primary literature (60.0%). Lack of trust in outputs (49.8%), uncertainty around institutional policies (48.6%), and lack of access to secure applications (43.1%) were cited as the most frequent barriers to using LLMs in practice. Discussion: The use of LLMs in clinical practice is already widespread, though regular or daily use is not yet typical.
Concerns regarding reliability, patient privacy, and safe integration into clinical workflows remain significant barriers to broader adoption. The responsible implementation of LLMs in hospital medicine will require addressing these barriers.

7

Neonatal EEG network activity associates with 2-year neurodevelopment after perinatal asphyxia

Syvalahti, T.; Tokariev, M.; Nevalainen, P.; Tuiskula, A.; Metsaranta, M.; Haataja, L.; Vanhatalo, S.; Tokariev, A.

2026-05-27 pediatrics 10.64898/2026.05.26.26354098 medRxiv

Top 0.4%

0.8%

Background: Prediction of long-term neurodevelopmental outcomes remains challenging after perinatal asphyxia. Here, we studied whether computational metrics of brain function derived from neonatal EEG are associated with long-term neurodevelopment in infants with perinatal asphyxia. Methods: A total of 36 term-born infants with perinatal asphyxia, with or without hypoxic-ischemic encephalopathy, were studied with neonatal multichannel electroencephalography (EEG). We computed local EEG amplitudes and phase-amplitude coupling (PAC), as well as large-scale functional cortical networks estimated using amplitude-amplitude correlations (AAC) and phase-phase correlations (PPC). These EEG-derived markers were tested for associations with neurodevelopmental outcomes at two years, assessed using the Griffiths Scales of Child Development, 3rd edition (GMDS-III). Results: EEG amplitudes showed positive associations with GMDS-III Foundations of Learning and General Development scores across most electrodes during quiet sleep, with the strongest effects observed at frontal and central regions (r = 0.44-0.66). PAC showed negative associations with the same scores mainly over parietal and temporal regions (r = -0.45 to -0.55). Cortical AAC networks demonstrated the most robust and widespread negative associations in all frequency bands during quiet sleep (r = -0.47 to -0.54), with 70-72% of connections significant in the high delta frequency band. In turn, PPC networks showed frequency-selective and more spatially constrained negative associations during quiet sleep (r = -0.48 to -0.53), involving 5-12% of the network. Conclusions: Both local and network-based metrics in the newborn brain show significant associations with neurodevelopmental outcome at 2 years after perinatal asphyxia.

8

Sequential application of time-stratified demographic, vital, clinical-laboratory and microbiology variables for accurate and rapid identification of sepsis

Navalkar, K. A.; Garnacho-Montero, J.; Canton-Bulnes, M. L.; Garcia-Garmendia, J. L.; Estella, A.; Fernandez-Galilea, A.; Blanco, I.; Estecha-Foncea, M. A.; Gordillo-Resina, M.; Rodriguez-Gomez, J.; Pineda-Capitan, J. J.; Martinez-Fernandez, C.; Escoresca-Ortega, A.; Amaya-Villar, R.; Mora-Ordonez, J.; Gonzalez-Soto, S.; Gutierrez-Pizarraya, A.; Balk, R.; Miller, R. R.; Burke, J. P.; Patel, G.; Parada, J. P.; Schultz, M. J.; Scicluna, B. P.; Blodget, E.; Kumar, S.; Sampson, D.; Yager, T. D.; Davis, R. F.; Cermelli, S.; Brandon, R. B.

2026-05-29 intensive care and critical care medicine 10.64898/2026.05.27.26354135 medRxiv

Top 0.5%

0.5%

Background: Accurate early identification of sepsis remains a major clinical challenge due to its heterogeneous presentation and overlap of clinical signs with the non-infectious systemic inflammatory response syndrome (SIRS). Timely differentiation is crucial for improving patient outcomes, meeting sepsis bundle requirements and reducing inappropriate antimicrobial use. We hypothesized that clinical and laboratory data available within the first 3 hours of patient presentation could be used to identify patients with sepsis to an actionable level of accuracy, in lieu of traditional microbiology results which would not become available until at least 12-24 hours. Data from two independent studies were used to quantify the diagnostic value of demographic, vital, clinical-laboratory, and microbiological data available at three time points for distinguishing retrospectively diagnosed critically ill patients with either sepsis or non-infectious SIRS. A particular focus of this work was an assessment of the utility of SeptiCyte RAPID (Immunexpress Inc., Seattle, Washington, USA) as an aid to sepsis diagnosis, producing actionable data within 1 hour. Methods: Data from two independent study cohorts were analysed. The 510k cohort consisted of 419 adult patients in intensive care (ICU) (MARS, VENUS, and NEPTUNE trials). The Andalusian cohort consisted of 353 ICU patients from the PANGEA study. Logistic regression models, selected by a greedy search algorithm and validated by repeated cross-validation, were used to determine the contributions of different variables to diagnostic accuracy. Diagnostic performance was quantified by area under the receiver operating characteristic curve (AUC). Results: For the 510k cohort, a baseline AUC of 0.69-0.73 was observed using 5-7 vital and demographic variables assessed immediately upon presentation (time T1). 
The addition of clinical-laboratory variables, in particular SeptiCyte RAPID, within 1-3 hours post-presentation (time T2) increased the AUC to 0.83-0.85. Finally, the addition of microbiological data 12-24 hours post-presentation (time T3) further improved the AUC to 0.90-0.91. Similar results were obtained for the Andalusian cohort, with the following AUC values at the three time points: at time T1, AUC = 0.67 based solely on vital signs and demographics; at time T2, AUC = 0.87 based on vitals + demographics + SeptiCyte RAPID or other clinical laboratory data; at time T3, AUC = 0.93 based on vitals + demographics + SeptiCyte RAPID or other clinical laboratory data + microbiology results. For both cohorts, the most significant variables included temperature, mean arterial pressure, respiratory rate, suspected infection site, SeptiCyte RAPID, procalcitonin, confirmed bacterial infection, and positive blood culture confirmation. Conclusions: Accuracy of identification of sepsis increases markedly as demographics and vital signs are supplemented with clinical-laboratory information, and ultimately with microbiological culture results. The fastest improvement occurs within the first 3 hours when laboratory data, and in particular SeptiCyte RAPID results, become available. Integrating rapid host-response testing with SeptiCyte RAPID into time-based diagnostic frameworks may enhance early sepsis recognition, improve antimicrobial stewardship, and support guideline-driven clinical decisions.
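The abstract mentions models selected by a greedy search, without giving details. One common variant is forward selection, sketched below with a pluggable scoring function. The variable names, the `toy_auc` scorer, and its additive gains are hypothetical stand-ins for a cross-validated logistic-regression AUC; they are not the study's variables' measured contributions.

```python
def greedy_forward_selection(candidates, score, min_gain=1e-3):
    """Add the variable that most improves the score until no gain exceeds min_gain."""
    selected = []
    best = score(selected)
    remaining = list(candidates)
    while remaining:
        trial_score, trial_var = max(
            ((score(selected + [c]), c) for c in remaining), key=lambda t: t[0]
        )
        if trial_score - best < min_gain:
            break
        selected.append(trial_var)
        remaining.remove(trial_var)
        best = trial_score
    return selected, best


# Hypothetical additive gains standing in for a cross-validated AUC.
GAIN = {
    "temperature": 0.10,
    "mean_arterial_pressure": 0.08,
    "septicyte_rapid": 0.15,
    "uninformative_marker": 0.0,
}


def toy_auc(variables):
    return 0.5 + sum(GAIN[v] for v in variables)


selected, final_score = greedy_forward_selection(GAIN, toy_auc)
```

In practice the scorer would refit the model on each candidate set inside repeated cross-validation, so that the selection is not driven by overfitting to a single split.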

9

Bridging Acoustic and Semantic Spaces for Interpretable Voice Scoring via Zero-Shot Semantic Expansion

Hsiao, C.; Cheng, Y.-R.; Yang, C.-Y.; Hsu, F.-S.

2026-06-01 health informatics 10.64898/2026.05.29.26354442 medRxiv

Top 0.5%

0.5%

Subjective auditory-perceptual evaluation and uninterpretable deep learning models limit the clinical assessment of voice disorders. This study proposes a two-phase zero-shot framework to evaluate voice pathology. First, an Audio Spectrogram Transformer is fine-tuned on the Perceptual Voice Quality Database to generate an acoustic latent space. Second, Orthogonal Procrustes analysis maps these acoustic embeddings directly onto the semantic space of a pre-trained Sentence Transformer. The geometric alignment produced continuous semantic axes that outperformed a supervised machine learning baseline in regressing clinician-rated GRBAS (Grade, Roughness, Breathiness, Asthenia, and Strain) severity scales. Furthermore, these axes correlate with traditional acoustic measures, including Harmonics-to-Noise Ratio and local jitter, while remaining robust when applied to aperiodic signals by not requiring fundamental frequency extraction. Most importantly, the model achieved zero-shot semantic expansion, successfully evaluating voices using an untrained, natural clinical vocabulary beyond the GRBAS scale. External validation on the Voice ICarus Database confirmed cross-corpus stability and demonstrated the capacity for zero-shot differential phenotyping of specific etiologies, such as hypokinetic dysphonia and reflux laryngitis. By bridging acoustic and semantic latent spaces, this framework offers an objective, continuous, and transparent metric for evaluating voice quality using voice descriptive vocabulary.

10

A Consensus-Driven Stacking Ensemble Framework for Interpretable Cardiovascular Risk Prediction and Clinical Deployment

Sozol, S. S.; Dev Nath, B. C.; Fahim, F. M. S.; Suzana, N. N.; Mirza, J. F.; Ahmmed, S.; Zohra, F.-T.; Zafr, A. H. A.; Uddin, M. N.; Mondal, M. R. H.; Hoque, A. S. M. L.

2026-05-26 health informatics 10.64898/2026.05.18.26352989 medRxiv

Top 1.0%

0.2%

Machine learning (ML) is being considered to help diagnose cardiovascular diseases (CVD). Still, challenges such as inconsistent and limited datasets, limited infrastructure, and global inequalities create a need for a reliable and practicable ML solution. This paper presents an ML-driven framework for predicting CVD risk scores and classifying status. Several data preprocessing techniques are applied, including multiple imputation by chained equations (MICE) and outlier removal. In addition, hyperparameter tuning is performed with the GridSearchCV tuning technique. Moreover, a consensus-driven five-feature selection method is applied to identify optimal predictors. The dataset used in this study contains healthcare records related to future CVD risk scores, comprising 1,529 patient records with 22 features. The optimized stacked ensemble model is applied to the dataset and achieves a cross-validated coefficient of determination of 98.13% for CVD risk score regression. Comparative evaluation with other ML models confirmed improved accuracy, efficiency, and interpretability. The explainable AI technique SHAP is applied to interpret predictions and highlight key risk factors. Moreover, a deployment-ready web platform with multi-role access has been developed to demonstrate clinical applicability. The proposed framework offers a reliable and interpretable tool for early detection of CVD and personalized risk assessment. In the future, this work can be extended to integrate longitudinal data, medical imaging, and deep learning to improve generalizability and strengthen real-world impact.

11

An ECG foundation model for generalizable cardiac function prediction across the lifespan

Yang, Y.; Peracchio, L.; Mayourian, J.; Miller, T.; La Cava, W.

2026-05-27 health informatics 10.64898/2026.05.26.26354128 medRxiv

Top 1%

0.2%

Background: Artificial intelligence-enhanced electrocardiography (AI-ECG) enables scalable, low-cost cardiac dysfunction screening, but existing models are annotation-intensive and predominantly adult-derived, leaving paediatric generalizability uncertain. Paediatric cohorts exhibit highly variable cardiac morphology and function compared to adults, which may be useful for learning generalizable AI-ECG models. Methods: We pretrained ECG-Fyler on a predominantly paediatric, all-age cohort at Boston Children's Hospital (1992-2023), annotated with a cardiology-specific coding system (Fyler codes), and evaluated it on assessments from echocardiography (echo) and cardiac magnetic resonance (CMR) studies. We validated on an external adult cohort from Columbia University Irving Medical Center. Performance was benchmarked against several AI-ECG foundation models by AUROC across age groups, lesion types, and limited-data scenarios. Findings: The pretraining cohort comprised 782,138 ECGs from 255,271 patients (median age: 10.9 years, IQR: [2.8-16.8]). Internal evaluation included 178,495 ECG-echo pairs (median age: 10.9 [3.7-17.0]) and 8,584 ECG-CMR pairs (median age: 20.7 [15.6-29.6]). External validation included 82,543 ECG-echo pairs from adults (median age: 64.0 [52.0-74.0]). ECG-Fyler improved AUROC across biventricular dysfunction and dilation tasks, with the largest gains in low-data settings. In internal validation, ECG-Fyler detected low left ventricular ejection fraction (LVEF ≤ 40%) from only 100 fine-tuning samples (AUROC: 0.80, 95% CI: [0.78-0.80]), outperforming other models (AUROC < 0.65) and improving with additional fine-tuning (AUROC: 0.94 [0.93-0.94]). Similar improvements were observed for CMR-derived LVEF, RVEF, and ventricular dilation. In external validation on adults, ECG-Fyler exhibited an AUROC of 0.83 (CI: [0.82-0.85]) for LVEF ≤ 40%.
After fine-tuning on less than 10% of external data, LVEF ≤ 45% performance (AUROC: 0.87 [0.86-0.88]) outperformed a fully trained, site-specific prior model (AUROC: 0.85 [0.84-0.87]). Interpretation: Pretraining on richly annotated, paediatric-dominant ECGs yields models that transfer efficiently across institutions and ages, supporting AI-ECG screening and triage when labels or imaging access are limited. Funding: National Institutes of Health (R01LM012973); Kostin Innovation Fund, Boston Children's Hospital

12

Non-inferiority of a red-blood-cell-only transfusion strategy compared with balanced resuscitation in adults with massive gastrointestinal haemorrhage: a propensity-score-weighted cohort study

Bahar, B.; Sweeney, J. D.; Nixon, C.

2026-05-26 gastroenterology 10.64898/2026.05.25.26354037 medRxiv

Top 1%

0.2%

Background. Balanced (1:1:1) transfusion of red blood cells (RBCs), plasma, and platelets is the standard of care in trauma-induced massive haemorrhage, where early coagulopathy is a defining feature. In gastrointestinal (GI) haemorrhage this physiology is not prominent, and whether plasma and platelets provide benefit when ≥ 10 RBC units are required within 24 hours is unknown. Objective. To test whether a red-blood-cell-only (RBC-only) transfusion strategy is non-inferior to a balanced (Balanced) strategy for in-hospital mortality in adults meeting massive-transfusion criteria for GI haemorrhage. Design. Single-centre retrospective cohort of 559 adult massive-transfusion encounters (536 patients; 2021-2025) with a primary admitting diagnosis of upper, lower, or unspecified GI haemorrhage. Exposures were RBC-only versus Balanced (RBCs with any plasma and/or platelets). The primary outcome was in-hospital mortality, with a pre-specified 5-percentage-point (pp) non-inferiority margin on the absolute risk difference and a 3-pp sensitivity margin. Analysis used augmented inverse-probability-of-treatment weighting (AIPTW) with bootstrap inference (2,000 resamples by patient). Five pre-specified sensitivity analyses were performed. Results. 505 encounters (90.3%) received RBC-only and 54 (9.7%) received Balanced transfusion. The AIPTW risk difference for in-hospital mortality (RBC-only minus Balanced) was -19.8 pp (95% CI -68.1 to -2.2 pp). Non-inferiority was demonstrated at both the primary 5-pp and the more stringent 3-pp margins. The five pre-specified sensitivity analyses, namely (1) a propensity-score matched cohort, (2) a complete-case model incorporating INR, (3) a broader GI diagnosis set (n = 749), (4) a first-encounter-per-patient restriction, and (5) an E-value bound analysis, were concordant with the primary estimate. Conclusion.
In this propensity-score-weighted cohort of adults with massive GI haemorrhage, an RBC-only transfusion strategy was non-inferior to a balanced strategy for in-hospital mortality at both 5-pp and 3-pp margins. The findings support individualized use of plasma and platelets in GI haemorrhage rather than reflexive application of the 1:1:1 trauma protocol; prospective confirmation is warranted.
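The non-inferiority conclusion can be read off the confidence interval. Under the usual convention, which the abstract does not spell out and is assumed here, a strategy is non-inferior when the upper confidence bound of the risk difference (new minus reference) lies below the margin. The helper name `is_non_inferior` is hypothetical; the bound and margins are the ones quoted in the abstract.

```python
def is_non_inferior(ci_upper_pp, margin_pp):
    """True when the upper CI bound of (new - reference) risk difference is below the margin."""
    return ci_upper_pp < margin_pp


# Upper 95% CI bound for RBC-only minus Balanced, in percentage points (from the abstract).
ci_upper = -2.2

# Primary 5-pp margin and the more stringent 3-pp sensitivity margin.
results = {margin: is_non_inferior(ci_upper, margin) for margin in (5.0, 3.0)}
```

Because the upper bound of -2.2 pp is below both margins, the check passes at 5 pp and at 3 pp, matching the abstract's statement.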

13

Peri Operative deLta rEnin ConcentrATion (POLECAT) Study Protocol and Analysis Plan

Boyer, N.; Haider, S.; Piercy, C.; Zarbock, A.; Samuels, T. L.; Papadopoulou, A.; Forni, L. G.; Creagh Brown, B.

2026-05-27 intensive care and critical care medicine 10.64898/2026.05.26.26352884 medRxiv

Top 1%

0.2%

Background: Post-operative hypotension and vasoplegia are well recognised following cardiac surgery but remain poorly characterised after major non-cardiac surgery, despite associations with acute kidney injury (AKI), cardiovascular complications, and increased mortality. Dysregulation of the renin angiotensin aldosterone system (RAAS) may underpin haemodynamic instability in this setting, yet data in abdominal surgery are limited. Objectives: The POLECAT (Perioperative delta Renin) study aims to determine whether changes in circulating renin concentration (delta renin) from pre-operative baseline to the early post-operative period are associated with post-operative vasoplegia in patients undergoing major abdominal surgery requiring intensive care admission. Methods: POLECAT is a single-centre, prospective observational study conducted at a UK tertiary referral hospital. Adult patients undergoing planned or emergency abdominopelvic surgery with anticipated intensive care admission are enrolled. Blood samples are obtained pre-operatively, within four hours post-operatively, and on post-operative day one to measure renin and a panel of endothelial, renal, and immune biomarkers. The primary outcome is post-operative vasoplegia, defined as the requirement for a vasopressor infusion at 08:00 on post-operative day one. Secondary outcomes include alternative vasoplegia definitions, AKI (KDIGO criteria), vasopressor burden, organ dysfunction, cardiovascular complications, length of stay, and mortality. Multivariable regression, receiver operating characteristic analyses, and predefined subgroup analyses will be performed, with sensitivity analyses addressing missing data. Conclusions: This study will clarify the relationship between peri-operative RAAS dysfunction and vasoplegia following major abdominal surgery. Findings may support biomarker-guided risk stratification and inform future interventional trials targeting haemodynamic instability in this high-risk population.

14

From CCTA to Surgical Strategy: An Integrated AI Framework for Patient-Specific Coronary artery bypass grafting Planning

Rezaeitaleshmahalleh, M.; Masoumi, S.; Debalme, E.; Sundt, T. M.; Aranki, S. F.; Shin, B.; Nezami, F. R.

2026-06-01 cardiovascular medicine 10.64898/2026.05.28.26354400 medRxiv

Top 1%

0.2%

Background: Coronary artery bypass grafting (CABG) remains the standard of care for complex multivessel and left main coronary artery disease. However, current preoperative planning remains largely subjective, relying on qualitative interpretation of coronary CT angiography (CCTA), operator-dependent stenosis grading, and fragmented multi-software workflows. Invasive fractional flow reserve (FFR), the reference standard for physiologic lesion assessment, is infrequently acquired preoperatively, leaving distal anastomosis planning without an objective hemodynamic basis. Methods: We developed a fully automated, AI-powered platform that converts routine CCTA into a patient-specific CABG planning workflow through five integrated modules: nnU-Net based segmentation of coronary lumen and calcification; quantitative morphological and topological characterization generating more than thirty descriptors; automated stenosis detection using a local reference-radius formulation; a nine-point composite scoring framework for distal anastomosis site selection incorporating luminal caliber, landing-zone length, calcification burden, distal perfusion reserve, and bifurcation proximity; and interactive virtual graft construction coupled to a distributed reduced-order solver for pre- and post-bypass FFR estimation. Results: Lumen segmentation achieved a mean Dice similarity coefficient of 0.96 ± 0.01, whereas calcium segmentation achieved 0.73 ± 0.15 on the held-out cohort. Platform-derived FFR demonstrated strong agreement with invasively measured FFR (r = 0.96, mean absolute relative difference 1.73 ± 1.42%) across the evaluated lesions, supporting the physiologic validity of the reduced-order hemodynamic solver.
End-to-end analysis from raw CCTA to hemodynamic assessment and virtual graft planning was completed in approximately seven minutes per case on a standard workstation, representing a substantial reduction in processing time compared with conventional multi-tool and CFD-based workflows. Conclusions: The proposed platform demonstrates the feasibility of rapid, reproducible, and physiology-informed CABG planning using routine CCTA. By integrating anatomical characterization, automated target-site analysis, virtual graft construction, and reduced-order hemodynamic assessment into a single workflow, the framework provides objective, quantitative surgical decision support compatible with routine clinical workflows. Keywords: Coronary artery bypass grafting (CABG); Fractional flow reserve (FFR); Coronary CT angiography (CCTA); Surgical planning
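For readers unfamiliar with the segmentation metric quoted above, the Dice similarity coefficient is twice the overlap between predicted and reference masks divided by the sum of their sizes. The flattened toy masks below are hypothetical, and treating two empty masks as perfect agreement is an assumed convention.

```python
def dice(pred, truth):
    """Dice similarity coefficient for two flattened binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    pred_set = {i for i, v in enumerate(pred) if v}
    truth_set = {i for i, v in enumerate(truth) if v}
    if not pred_set and not truth_set:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2 * len(pred_set & truth_set) / (len(pred_set) + len(truth_set))


# Hypothetical flattened masks: 1 = voxel labelled as lumen, 0 = background.
pred_mask = [0, 1, 1, 1, 0, 0]
truth_mask = [0, 1, 1, 0, 1, 0]
score = dice(pred_mask, truth_mask)
```

Small structures such as calcifications tend to yield lower Dice values than large ones for the same absolute boundary error, which is one plausible reading of the gap between the lumen and calcium scores.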

15

ERBB4 deficiency promotes atrial myopathy underlying the atrial fibrillation substrate

Yamaguchi, N.; Santucci, J.; Hong, S. J.; Ferrena, A.; Schlamp, F.; Willett, D.; Casdin, C. J.; Park, P. S.; Lin, X.; Xiao, J.; Hall, S.; Barnard, J.; Achter, J.; Kanhert, K.; Lundby, A.; Chung, M. K.; Van Wagoner, D. R.; Park, D. S.

2026-05-27 cardiovascular medicine 10.64898/2026.05.26.26354173 medRxiv

Top 1%

0.2%

Background Atrial fibrillation (AF) is a leading cause of stroke, cardiovascular morbidity, and mortality. Atrial myopathy, characterized by progressive metabolic, electrical, and structural changes, creates the arrhythmogenic substrate that drives AF. Defining the key drivers of atrial myopathic processes is essential for targeted therapies that can mitigate AF progression. Here we explore how reduced ERBB4 expression contributes to the development of left atrial myopathy. Methods We analyzed the Cleveland Clinic Biobank to compare left atrial ERBB4 levels in patients grouped by AF diagnosis. To investigate the impact of reduced ERBB4 levels on atrial tissue substrate, we created mouse models of cardiac-specific Erbb4 deficiency using Mlc2a (myosin light chain 2a)-Cre. Comprehensive physiological assessments were performed. Transcriptomic analyses of the left atrium were performed in an Erbb4 haploinsufficient mouse model and compared with human atrial datasets. Molecular validation of key dysregulated pathways was performed. Results We found that left atrial ERBB4 levels are reduced in patients with AF. Adult cardiomyocyte-specific Erbb4 heterozygous (Erbb4fl/+;Mlc2a-Cre) mice exhibited prolonged P-wave duration in the absence of ventricular dysfunction. Left atrial transcriptomic analysis in Erbb4 haploinsufficient mice showed upregulation of pathways related to fibrosis, apoptosis, and coagulation, and downregulation of pathways related to fatty acid metabolism and mitochondrial function, mirroring changes observed in pressure overload mouse models. A cross-species transcriptomic comparison revealed significant overlap between ERBB4-correlated gene expression and functional pathways in adult human atria and mice with Erbb4 haploinsufficiency. Validating the transcriptomic data, protein and functional assays demonstrated increased fibrosis, apoptosis, and oxidative stress in the mutant left atrial tissue. 
Conclusion Left atrial ERBB4 levels are reduced in AF patients. A mouse model of Erbb4 deficiency and human atrial transcriptomic analyses highlight a role for ERBB4 in supporting normal atrial metabolism while protecting against inflammation, apoptosis, and fibrosis.

16

Segmental Lung Sound Analysis in Obstructive Lung Diseases Using Electronic Stethoscope; a protocol to establish an acoustic repository

Anuradha, H.; Yasaratne, D.; GMRI, G.; Parakrama, E.; Severin, R.

2026-05-28 respiratory medicine 10.64898/2026.05.27.26354263 medRxiv

Top 1%

0.2%

Introduction Obstructive lung diseases (OLDs) are responsible for high rates of illness and death worldwide. Inflammation, chronic airflow limitation, and bronchial remodeling occur in OLDs and eventually produce distinctive respiratory sounds. Although it is subjective and has low reproducibility, traditional auscultation with a manual stethoscope remains the main method of identifying lung sounds. Recent advances in digital stethoscopes and artificial intelligence (AI) now permit objective measurement of lung sounds. However, standardized, region-specific databases for AI training and validation are lacking. Although lung sound classification is an emerging area of research and telerehabilitation, lobe-wise acoustic patterns remain largely unexplored because no existing database is available to train AI models. To address this gap, this study aims to develop an acoustic repository and to analyze segmental lung sounds recorded with an electronic stethoscope from patients with OLDs and healthy controls. Methods and analysis This is a cross-sectional observational study involving 120 participants (60 OLD patients and 60 healthy controls). Lobe-wise acoustic signals will be captured with an electronic stethoscope in the healthy and diseased populations. The recordings will be annotated in Audacity and then used for feature extraction and statistical analysis. The acoustic features extracted through Audacity will include frequency, intensity, pitch, and root mean square (RMS) energy. Repeated-measures ANOVA will be applied to compare mean sound intensities across lung segments, while Pearson correlation will be used to assess associations with body composition parameters. The data will then be standardized for AI-based diagnostic applications. 
Ethics and dissemination The study is under review by the Ethics Review Committee, Faculty of Medicine, University of Peradeniya (2025/EC/87). Written informed consent will be obtained. Results will be disseminated through peer-reviewed publications and a public database of lung sounds from the region.
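Two of the quantities named in this protocol, RMS energy and Pearson correlation, have simple closed forms. The sketch below shows them in plain Python as an illustration only; the protocol itself states that features are extracted in Audacity, and the function names and sample values here are hypothetical.

```python
import math


def rms_energy(samples):
    """Root mean square of an audio segment given as a sequence of amplitudes."""
    if not samples:
        raise ValueError("segment must contain at least one sample")
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Invented values for illustration only (not study data).
segment = [3.0, -3.0, 3.0, -3.0]
print(rms_energy(segment))  # square wave of amplitude 3 has RMS 3.0
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```

A per-segment RMS value computed this way could then be paired with a body composition measure per participant and passed to `pearson_r`.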

17

Multi-Agent AI for Chest Radiography: A Sequential Segmentation and LLM-Driven Consultative Tool for Medical Training

Kurt, F.; Subasi, A.

2026-06-01 health informatics 10.64898/2026.05.29.26354432 medRxiv

Top 1%

0.2%

Background: Traditional diagnostic models lack explainability, while multimodal language models prone to hallucination remain unsafe for medical education. An interactive, risk-free artificial intelligence framework is required to serve as a reliable clinical mentor for radiology trainees. Methods: We propose a multi-agent architecture decoupling deterministic image analysis from generative consultation. Specialized computer vision models perform anatomical localization and pathological segmentation. These quantitative outputs are synthesized into a structured payload, which grounds a locally hosted large language model (LLaVA 7B) using strict prompt guardrails and prerequisite protocols. Results: The system effectively eliminates visual hallucinations by intercepting unanchored queries. The artificial intelligence tutor successfully contextualizes spatial anomalies and baseline metrics, generating accurate conversational explanations and formally structured radiology reports while strictly enforcing medical safety disclaimers. Discussion and Conclusion: By anchoring language generation exclusively to verified algorithmic realities, this framework transforms opaque diagnostic models into safe, interactive educational simulators. This establishes a highly reliable paradigm for integrating explainable artificial intelligence into medical training.

18

Breath volatile profiling reveals a diagnostic signature of MASLD in children

Berna, A. Z.; Panganiban, J.; Liu, Y.; Logan, J.; Russo, P.; Aryal, A.; Hafertepe, K.; Abu-Alreesh, S.; DeBosch, B.; Stoll, J.; John, A. R. O.

2026-05-27 gastroenterology 10.64898/2026.05.26.26353794 medRxiv

Top 1%

0.1%

Background & Aims: Metabolic Dysfunction-Associated Steatotic Liver Disease (MASLD) is the leading cause of chronic liver disease in children. However, accurate, noninvasive diagnostic tools remain limited. Current screening methods are invasive or lack sensitivity. Breath-based volatile organic compound (VOC) analysis offers a simple approach with potential for point-of-care screening. This study aimed to identify and validate breath VOC signatures of pediatric MASLD. Approach & Results: We conducted a prospective, IRB-approved cohort study at the Children's Hospital of Philadelphia (CHOP). Children aged between 7 and 20 years with MASLD (n=22), defined as hepatic steatosis on liver biopsy or imaging plus 1 cardiometabolic risk factor, and a control group without MASLD (n=20) were enrolled. Breath samples were collected using a standardized protocol and analyzed by untargeted comprehensive two-dimensional gas chromatography-mass spectrometry (GC×GC-MS). Machine learning and unsupervised clustering were applied to identify discriminatory VOCs and assess heterogeneity. Untargeted GC×GC-MS analysis identified a distinct breath VOC signature in children with MASLD compared with non-MASLD controls. A Random Forest model achieved a sensitivity of 73% and specificity of 65%, with an AUC of 0.84. The VOC 2,4-dimethyl-1-heptene demonstrated strong diagnostic performance in the discovery cohort, with a sensitivity of 85%, specificity of 77%, and an AUC of 0.81. Unsupervised clustering revealed four MASLD subgroups with distinct volatile phenotypes associated with differences in liver enzymes and metabolic parameters. External validation in a second pediatric cohort confirmed reproducible reductions in o/p-xylene in subjects with MASLD. Conclusions: Pediatric MASLD is associated with a reproducible breath VOC signature identified by untargeted GC×GC-MS. 
These findings support breath analysis as a scalable, noninvasive screening and stratification tool for pediatric MASLD and warrant validation in larger, longitudinal studies.

19

SeGA-GNN: Semantically Gated Augmented Graph Neural Networks for Wearable-Based Emotion Detection

Kurt, F.; Subasi, S. N.; Yakisan, E. S.; Subasi, A.

2026-06-01 health informatics 10.64898/2026.05.29.26354434 medRxiv

Top 1%

0.1%

Background: Wearable technologies enable scalable and continuous monitoring of emotional states through passive sensing of physiological and behavioral signals. However, conventional learning approaches often struggle to model the complex temporal, contextual, and relational dependencies underlying human emotions. To address these limitations, we propose a graph-based framework that represents multimodal wearable observations as heterogeneous knowledge graphs enriched with semantic information derived from Large Language Models (LLMs), enabling richer contextual understanding beyond raw sensor measurements. Methods: We constructed a heterogeneous knowledge graph using multimodal Fitbit physiological signals and affective self-report data collected from 45 users. Mood prediction and emotion detection were formulated as both binary and ternary node classification tasks. We evaluated five baseline heterogeneous Graph Neural Network (GNN) architectures and compared them with the proposed Semantically Gated Augmented Graph Neural Network (SeGA-GNN) framework, which dynamically integrates LLM-generated semantic embeddings into graph representations through a gated cross-modal fusion mechanism. Results: The baseline GNN models achieved strong performance, with classification accuracies ranging from 0.7525 to 0.9739 for binary classification and 0.6249 to 0.9699 for ternary classification. The proposed SeGA framework consistently improved predictive performance across most architectures. In particular, semantic augmentation transformed the HAN model from moderate baseline performance into near-perfect emotion recognition capability, achieving SeGA-HAN Accuracy = 0.9988 and AUC = 1.0000 for binary classification and Accuracy = 0.9979 and AUC = 1.0000 for ternary classification. 
Discussion and Conclusion: Integrating LLM-derived semantic contextualization into heterogeneous graph learning enables effective modeling of contextual information that is not directly captured by wearable physiological signals alone. The proposed SeGA-GNN framework demonstrates that adaptive semantic fusion substantially improves the accuracy, robustness, and interpretability of wearable-based emotion detection. These findings establish a promising direction for next-generation wearable affective computing systems and intelligent emotion-aware applications.
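The abstract does not specify the form of the gated cross-modal fusion, so the following is only a sketch of one common gating pattern, not the authors' implementation: a sigmoid gate computed from the concatenated node and semantic embeddings decides, per dimension, how much of each to keep. All names, shapes, and weights are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gated_fusion(node_vec, semantic_vec, gate_weights, gate_bias):
    """Per-dimension gated mix of a graph node embedding and a semantic embedding.

    gate_weights holds one row per output dimension; each row has
    len(node_vec) + len(semantic_vec) entries (an assumed parameterization).
    """
    if len(node_vec) != len(semantic_vec):
        raise ValueError("embeddings must share the same dimension")
    concat = list(node_vec) + list(semantic_vec)
    fused = []
    for i, (h, s) in enumerate(zip(node_vec, semantic_vec)):
        z = sum(w * x for w, x in zip(gate_weights[i], concat)) + gate_bias[i]
        g = sigmoid(z)  # g near 1 keeps the sensor-derived node feature
        fused.append(g * h + (1.0 - g) * s)
    return fused


# Zero weights and bias give g = 0.5, i.e. a plain average of the two inputs.
node = [1.0, 0.0]
semantic = [0.0, 1.0]
zero_w = [[0.0] * 4, [0.0] * 4]
print(gated_fusion(node, semantic, zero_w, [0.0, 0.0]))  # [0.5, 0.5]
```

In a trained model the gate weights would be learned jointly with the GNN, which lets the network lean on semantic context only where the physiological signal is uninformative.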

20

Ruling In and Ruling Out Sepsis Using Likelihood Ratios of a Host Response Assay

Navalkar, K. A.; Wani, P.; Davis, R. F.; Cermelli, S.; Dietrich, M.; von der Forst, M.; Becker, S. L.; Benthien, S.; Baumann, E.; Zeiner, C.; Lepper, P. M.; Garnacho-Montero, J.; Canton-Bulnes, M. L.; Fernandez-Galilea, A.; Luis Garcia-Garmendia, J. L.; Estella, A.; Miller, R. R.; Schultz, M. J.; Rothman, R.; Burke, J.; Patel, G.; Parada, J.; Yager, T. D.; Brandon, R. B.

2026-06-01 intensive care and critical care medicine 10.64898/2026.05.29.26354374 medRxiv

Top 2%

0.1%

Overview: SeptiCyte RAPID is an FDA-cleared gene expression test that quantifies host immune response to aid in the diagnosis of sepsis. The test yields a score (the SeptiScore) ranging from 0-15, distributed across four bands (1-4) based on increased likelihood of sepsis. Each band can be characterized by average positive and negative likelihood ratios (LR+, LR- respectively) for the discrimination of sepsis versus the non-infectious systemic inflammatory response syndrome (SIRS). Methods: A retrospective analysis of prospectively collected data from a combined cohort of critically ill patients suspected of sepsis (N=889), recruited across 19 hospitals in the USA and Europe. The analysis quantified the LR+ and LR- parameters as a function of SeptiScore, for discrimination of sepsis vs. SIRS in patients admitted to ICU. Hypotheses: (1) The likelihood ratio (LR) framework provides a clinically useful interpretive approach that complements the previously used SeptiScore banding scheme; (2) Low Band 1 SeptiScores are associated with sufficiently small LR- to support the use of SeptiCyte RAPID as a rule-out test for sepsis; (3) High Band 4 SeptiScores are associated with sufficiently large LR+ to support the use of SeptiCyte RAPID as a rule-in test for sepsis; and (4) SeptiScore-derived LR+ and LR- values can be combined with estimates of pre-test probability (derived from patient characteristics and/or other diagnostic tests) to generate individualized, patient-specific post-test probabilities of sepsis. Results: The SeptiCyte RAPID test demonstrates strong diagnostic performance in distinguishing sepsis from SIRS. The likelihood ratios across different score bands provide clear clinical utility: the median LR+ was 3.26 (range 2.57-4.24) for Band 3, and 6.97 (range 4.35-15.57) for Band 4 providing evidence toward ruling in sepsis at high SeptiScores. 
Conversely, the median LR- was 0.16 (range 0.14-0.20) for Band 2 and 0.085 (range 0.014-0.16) for Band 1, providing evidence toward ruling out sepsis at low SeptiScores. A higher-resolution analysis of SeptiCyte RAPID performance confirmed these trends by evaluating LR+ and LR- at specific values within each band. The sepsis group was further stratified according to whether patients were classified as blood-culture positive (BC+) or blood-culture negative (BC-), and the detailed LR+ and LR- analyses were repeated. A monotonic increase in likelihood ratio with increasing SeptiScore was consistently observed, independent of whether sepsis patients were culture-positive, culture-negative, or unstratified with respect to blood culture status. Conclusion: High SeptiScores have correspondingly high LR+ values, and low SeptiScores have correspondingly low LR- values, both of which may have clinical utility. High likelihood ratios for Band 4 SeptiScores, which precede traditional microbiology results, may give clinicians early confidence in a sepsis diagnosis and support microbiology diagnostic stewardship. Low likelihood ratios for Band 1 SeptiScores may prompt clinicians to consider a diagnosis other than sepsis. Such results, obtained early in the diagnostic workup process, may lead to fewer missed diagnoses and more efficient use of hospital resources.
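Hypothesis (4) in this abstract rests on the standard odds form of Bayes' rule: post-test odds equal pre-test odds multiplied by the likelihood ratio. The sketch below works that arithmetic through using the median Band 4 LR+ and Band 1 LR- reported in the abstract; the function name and the 30% pre-test probability are hypothetical and chosen only for illustration.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    if not 0.0 < pre_test_probability < 1.0:
        raise ValueError("pre-test probability must lie strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood ratio cannot be negative")
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


PRE_TEST = 0.30       # hypothetical clinical estimate, not from the study
LR_POS_BAND4 = 6.97   # median LR+ for Band 4, as reported in the abstract
LR_NEG_BAND1 = 0.085  # median LR- for Band 1, as reported in the abstract

print(round(post_test_probability(PRE_TEST, LR_POS_BAND4), 3))
print(round(post_test_probability(PRE_TEST, LR_NEG_BAND1), 3))
```

Under that assumed 30% pre-test probability, a Band 4 result raises the probability of sepsis to roughly 75%, while a Band 1 result lowers it to roughly 3.5%.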